
15 research outputs found

Watermarking Technique for Multimedia Documents in the Frequency Domain

Author: Bellaaj Maha
Ouni Kaïs
Publication venue: 'IntechOpen'
Publication date: 30/04/2019
Field of study

To secure multimedia documents and preserve their authenticity and integrity, we use digital watermarking. This discipline can be applied to images, audio, and video. So that the approach does not depend on the nature of the signals composing the document to be watermarked, this chapter proposes two watermarking techniques, one for audio and one for images, which together watermark a video containing both components. In the image watermarking technique, the MDCT is combined with the Watson model and a motion detection algorithm; in the audio watermarking technique, it is combined with a psychoacoustic model. In both techniques, the bits of the mark are duplicated to increase the insertion capacity and then inserted into the least significant bit (LSB). An error-correcting code (Hamming) is applied to the mark to make detection more reliable. To highlight our experimental results in terms of robustness and imperceptibility, we compare the proposed techniques with some existing techniques.
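The mark-coding steps named in the abstract (Hamming code, bit duplication, LSB insertion) can be sketched as below. This is a minimal illustration under stated assumptions, not the chapter's implementation: it assumes a Hamming(7,4) code applied before duplication, a repetition factor of 3 decoded by majority vote, and coefficients already quantized to integers. The MDCT, the Watson and psychoacoustic masking models, and motion detection are not modeled, and the function names are hypothetical.

```python
from typing import List, Tuple

def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits as a Hamming(7,4) codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c: List[int]) -> List[int]:
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # 1-based error position; 0 means no error detected
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

def embed(coeffs: List[int], mark: List[int], rep: int = 3) -> Tuple[List[int], int]:
    """Hamming-code the mark, repeat each coded bit `rep` times, write into coefficient LSBs."""
    if len(mark) % 4:
        raise ValueError("mark length must be a multiple of 4")
    coded: List[int] = []
    for i in range(0, len(mark), 4):
        coded += hamming74_encode(mark[i:i + 4])
    payload = [b for b in coded for _ in range(rep)]
    if len(payload) > len(coeffs):
        raise ValueError("not enough coefficients for the payload")
    out = list(coeffs)
    for i, b in enumerate(payload):
        out[i] = (out[i] & ~1) | b  # replace the least significant bit
    return out, len(coded)

def extract(coeffs: List[int], n_coded: int, rep: int = 3) -> List[int]:
    """Majority-vote each repeated bit, then Hamming-decode the codewords."""
    coded = []
    for k in range(n_coded):
        votes = sum(coeffs[k * rep + j] & 1 for j in range(rep))
        coded.append(1 if 2 * votes > rep else 0)
    mark: List[int] = []
    for i in range(0, n_coded, 7):
        mark += hamming74_decode(coded[i:i + 7])
    return mark
```

Each coefficient changes by at most one quantization step, and a single corrupted copy of a bit is absorbed by the majority vote before the Hamming decoder handles any remaining single-bit error per codeword.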

IntechOpen

Crossref

Using Hidden Markov Models for ECG Characterisation

Author: Ellouze Noureddine
Krimi Samar
Ouni Kaïs
Publication venue: 'IntechOpen'
Publication date: 19/04/2011
Field of study

IntechOpen

Data-Efficient Domain Adaptation for Semantic Segmentation of Aerial Imagery Using Generative Adversarial Networks

Author: Ammar Adel
Ben Jdira Bilel
Koubaa Anis
Ouni Kaïs
Publication venue: 'MDPI AG'
Publication date: 01/01/2020
Field of study

Despite significant advances in the semantic segmentation of aerial imagery, a considerable limitation blocks its adoption in real cases: if a segmentation model is tested on a new area that is not included in its initial training set, its accuracy decreases markedly. This is caused by the domain shift between the new target domain and the source domain used to train the model. In this paper, we address this challenge and propose a new algorithm that uses a Generative Adversarial Network (GAN) architecture to minimize the domain shift and increase the ability of the model to work on new target domains. The proposed architecture contains two GAN networks. The first GAN network converts the chosen image from the target domain into a semantic label. The second GAN network converts this generated semantic label into an image that belongs to the source domain but preserves the semantic map of the target image. The semantic segmentation model then uses this resulting image to generate a better semantic label for the originally chosen image. Our algorithm was tested on the ISPRS semantic segmentation dataset and improved global accuracy by up to 24% when passing from the Potsdam domain to the Vaihingen domain. This margin can be increased by adding other labeled data from the target domain. To minimize the cost of supervision in the translation process, we propose a methodology for using these labeled data efficiently.
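The inference chain described above (target image, then label, then source-style image, then segmentation) and a pixel-level global accuracy can be sketched as follows. This is a hedged sketch: the three callables stand in for trained networks, their names are hypothetical, and "global accuracy" is assumed here to mean the fraction of correctly classified pixels.

```python
from typing import Any, Callable, Sequence

def adapt_and_segment(
    target_image: Any,
    g_image_to_label: Callable[[Any], Any],   # first GAN generator: target image -> semantic label
    g_label_to_source: Callable[[Any], Any],  # second GAN generator: label -> source-style image
    segmenter: Callable[[Any], Any],          # segmentation model trained on the source domain
) -> Any:
    """Chain the two generators, then segment the source-style image."""
    coarse_label = g_image_to_label(target_image)
    source_like_image = g_label_to_source(coarse_label)
    return segmenter(source_like_image)

def global_accuracy(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Fraction of pixels whose predicted class equals the ground-truth class."""
    total = correct = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            total += 1
            correct += int(p == t)
    return correct / total if total else 0.0
```

Keeping the segmenter frozen and only translating its input is what lets a model trained on one domain be reused on another; in practice each callable would be a trained network operating on image tensors.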

Repositório Científico do Instituto Politécnico do Porto

Enhancement of esophageal speech using voice conversion techniques

Author: Ben Othmane Imen
Di Martino Joseph
Ouni Kaïs
Publication venue: HAL CCSD
Publication date: 05/12/2017
Field of study

International audience.
This paper presents a novel approach for enhancing esophageal speech using voice conversion techniques. Esophageal speech (ES) is an alternative voice that allows a patient with no vocal cords to produce sounds after total laryngectomy; this voice has poor intelligibility and poor quality. To address this issue, we propose a speaking-aid system that enhances ES to make it clearer and more natural. Given the specificity of ES, in this study we propose to apply a new voice conversion technique that takes into account the particularity of the pathological vocal apparatus. We trained deep neural networks (DNNs) and Gaussian mixture models (GMMs) to predict "laryngeal" vocal tract features from esophageal speech. The converted vectors are then used to estimate the excitation cepstral coefficients and phase by a search in the target training space, previously encoded as a binary tree. The resynthesized voice sounds like a laryngeal voice, i.e., it is more natural than the original ES, with an effective reconstruction of the prosodic information while retaining (and this is the highlight of our study) the vocal tract characteristics inherent to the source speaker. The voice conversion results, evaluated using objective and subjective experiments, validate the proposed approach.
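The abstract does not say which binary tree encodes the target training space. As an assumption, the sketch below uses a k-d tree nearest-neighbour search: each stored target feature vector carries the index of its training frame, and the frame found for a converted vector is the one whose excitation and phase would be reused. Names and the payload layout are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    point: Point   # target vocal-tract feature vector
    frame: int     # index of the target training frame it came from
    axis: int      # coordinate used to split at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_tree(items: Sequence[Tuple[Point, int]], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree from (feature vector, frame index) pairs."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, frame = items[mid]
    return Node(point, frame, axis,
                build_tree(items[:mid], depth + 1),
                build_tree(items[mid + 1:], depth + 1))

def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node: Optional[Node], query: Point,
            best: Optional[Tuple[float, int]] = None) -> Optional[Tuple[float, int]]:
    """Return (squared distance, frame index) of the closest stored vector."""
    if node is None:
        return best
    d = sq_dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.frame)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # the other side can only help if the splitting plane is closer than the best match
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best
```

A tree keeps each lookup close to logarithmic in the number of training frames on typical data, which matters when a search runs for every converted frame.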

INRIA a CCSD electronic archive server

Towards the transformation of esophageal speech into laryngeal voice using voice conversion techniques

Author: Ben Othmane Imen
Di Martino Joseph
Ouni Kaïs
Publication venue: HAL CCSD
Publication date: 29/06/2017
Field of study

National audience.
This work concerns the development of an esophageal voice conversion system whose aim is to make this voice more intelligible. Voice conversion is a technique for transforming the speech signal of a source speaker so that, to a listener, it sounds as if it were spoken by a target speaker. Given the specificity of esophageal voice, in this study we propose to apply a new voice conversion technique that takes into account the particularity of the vocal apparatus of patients who have undergone a laryngectomy. Indeed, removal of the vocal cords profoundly disrupts the glottal signal; consequently, the esophageal voice acquired by the laryngectomized patient is hard to understand, hoarse, and low in intensity. Several voice conversion techniques have been proposed in the literature, among them linear predictive coding for voice conversion [1] and multivariate linear regression [2], which aims to reduce discontinuity and spectral distortion.
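As background for the multivariate linear regression mentioned above, the sketch below fits an affine mapping from source to target feature vectors by ordinary least squares (normal equations). It is a generic illustration, not the method of [2] or of this work; it assumes the source and target frames are already time-aligned, and the helper names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]

def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + list(brow) for row, brow in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def fit_linear_mapping(src: Sequence[Sequence[float]], tgt: Sequence[Sequence[float]]) -> Matrix:
    """Least-squares weights W (with a bias row) such that tgt ~ [src, 1] @ W."""
    X = [list(x) + [1.0] for x in src]  # append a constant term for the bias
    d, k = len(X[0]), len(tgt[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    xty = [[sum(X[t][i] * tgt[t][j] for t in range(len(X))) for j in range(k)] for i in range(d)]
    return solve(xtx, xty)

def convert(W: Matrix, x: Sequence[float]) -> List[float]:
    """Apply the fitted affine mapping to one source feature vector."""
    xb = list(x) + [1.0]
    return [sum(xb[i] * W[i][j] for i in range(len(xb))) for j in range(len(W[0]))]
```

A single global linear map is smooth across frames, which is the property cited for reducing discontinuity, but it cannot capture class-dependent transformations.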

INRIA a CCSD electronic archive server

Formant tracking by multiresolution analysis

Author: HATON Jean-Paul
JEMAA Imen
OUNI Kaïs
Publication venue
Publication date: 01/01/2013
Field of study

Our research work presented in this thesis aims to optimize the performance of formant tracking algorithms. We began by analyzing the different existing techniques used in automatic formant tracking. This analysis showed that automatic formant estimation remains difficult despite the use of various complex techniques.
Since no reference database was available for Arabic, we developed a phonetically balanced Arabic corpus together with manual phonetic and formant labeling. We then presented our two new automatic formant tracking approaches. The first is based on estimating Fourier ridges (local maxima of the spectrogram) or wavelet ridges (local maxima of the scalogram), using as a tracking constraint the center of gravity of the set of candidate frequencies for each formant; the second is based on dynamic programming combined with Kalman filtering. Finally, we carried out an exploratory study, using our manually labeled corpus as a reference, to quantitatively evaluate our two new approaches against other automatic formant tracking methods. We tested the first approach, wavelet ridge detection with the center-of-gravity calculation, on synthetic signals and then on real signals from our database, using three types of complex wavelets (CMOR, SHAN and FBSP). These tests show that the formant tracking and scalogram resolution given by the CMOR and FBSP wavelets are better than those given by the SHAN wavelet. To evaluate our two approaches quantitatively, we calculated the mean absolute difference and the normalized standard deviation. We ran several tests with different speakers (male and female) on various long and short vowels and on continuous speech from our database, using the labeled signals as a reference. The formant tracking results were compared with those of the Fourier ridges method with center-of-gravity calculation, the LPC analysis combined with filter banks of Mustafa Kamran, and the LPC analysis integrated in the Praat software. On the vowels /a/ and /A/, formant tracking with the CMOR wavelet method is generally better than with the other methods. This method therefore provides relevant formant tracking (F1, F2 and F3) that is closer to the reference.
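The first approach (spectral ridges plus a center-of-gravity constraint) can be illustrated on a magnitude spectrogram. This is a minimal sketch under assumptions not stated in the thesis summary: each formant is searched in a fixed frequency band, ridge candidates are local maxima of each frame, and candidates are weighted by their magnitude. The same logic would apply to scalogram maxima for the wavelet variant. Band limits and names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

def local_maxima(frame: Sequence[float]) -> List[int]:
    """Bins higher than the left neighbour and not lower than the right neighbour."""
    return [k for k in range(1, len(frame) - 1)
            if frame[k] > frame[k - 1] and frame[k] >= frame[k + 1]]

def center_of_gravity(freqs: Sequence[float], weights: Sequence[float]) -> Optional[float]:
    """Magnitude-weighted mean frequency; None when there is no candidate."""
    total = sum(weights)
    if total <= 0:
        return None
    return sum(f * w for f, w in zip(freqs, weights)) / total

def track_formants(spectrogram: Sequence[Sequence[float]], bin_hz: float,
                   bands: Dict[str, Tuple[float, float]]) -> Dict[str, List[Optional[float]]]:
    """Per frame and per formant band [lo, hi), combine ridge candidates by center of gravity."""
    tracks: Dict[str, List[Optional[float]]] = {name: [] for name in bands}
    for frame in spectrogram:
        peaks = local_maxima(frame)
        for name, (lo, hi) in bands.items():
            cand = [k for k in peaks if lo <= k * bin_hz < hi]
            tracks[name].append(center_of_gravity([k * bin_hz for k in cand],
                                                  [frame[k] for k in cand]))
    return tracks
```

Averaging the candidates instead of picking the single strongest peak damps jumps between neighbouring ridges, at the cost of biasing the estimate when two formants share a band.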
The results of the Fourier and wavelet methods are very similar in some cases, since both have fewer errors than the Praat method. This holds for the five male speakers, but not for the other vowels, where errors sometimes appear on F2 and sometimes on F3. On continuous speech, for male speakers, the results of both new approaches are notably better than those of Mustafa Kamran's LPC method and of Praat, even though they often show some errors on F3. They are also very close to those of the Fourier ridges method using the center-of-gravity calculation. The results obtained for female speakers confirm the trend observed for male speakers.
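The comparisons above rest on per-frame errors against the manually labeled reference. The sketch below computes the mean absolute difference and the standard deviation of the absolute differences for one formant track, skipping frames where either value is missing. The thesis reports a normalized standard deviation but the summary does not give the normalization, so this sketch returns the plain population value in Hz; dividing by, for example, the mean reference value would be one possible choice (an assumption).

```python
import math
from typing import Optional, Sequence, Tuple

def formant_error_stats(tracked: Sequence[Optional[float]],
                        reference: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Mean absolute difference and population std of |tracked - reference| (Hz)."""
    diffs = [abs(t - r) for t, r in zip(tracked, reference)
             if t is not None and r is not None]
    if not diffs:
        raise ValueError("no frames with both a tracked and a reference value")
    mad = sum(diffs) / len(diffs)
    std = math.sqrt(sum((d - mad) ** 2 for d in diffs) / len(diffs))
    return mad, std
```

Running it separately for F1, F2 and F3 and for each speaker group gives the kind of per-formant comparison the results paragraph describes.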

OpenGrey Repository

Improving the computational performance of standard GMM-based voice conversion systems used in real-time applications

Author: Ben Othmane Imen
Di Martino Joseph
Ouni Kaïs
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/12/2018
Field of study

International audience.
Voice conversion (VC) can be described as finding a mapping function that transforms the features extracted from a source speaker into those of a target speaker. Gaussian mixture model (GMM) based conversion is the most commonly used technique in VC, but it is often sensitive to overfitting and oversmoothing. To address these issues, we propose a secondary classification that applies K-means within each class obtained by a primary classification, in order to obtain more precise local conversion functions. This proposal avoids the need for complex training algorithms because the local mapping functions are determined at the same time. The proposed approach consists of a Fourier cepstral analysis followed by a training phase that finds the local mapping functions transforming the vocal tract characteristics of the source speaker into those of the target speaker. The converted parameters, together with the excitation and phase extracted from the target training space using frame index selection, are used in the synthesis step to generate converted speech with the target speaker's characteristics. Objective and subjective experiments show that the proposed technique outperforms the baseline GMM approach while greatly reducing training and transformation computation times.
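The two-level classification idea can be sketched with plain K-means. This is an illustration under assumptions, not the paper's algorithm: the primary classification (GMM-based in the paper's setting) is replaced here by K-means, source and target frames are assumed time-aligned, each subclass gets the simplest possible local mapping (an offset from mean source to mean target), and a deterministic farthest-point initialization keeps the sketch reproducible. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = List[float]

def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vs: Sequence[Sequence[float]]) -> Vec:
    return [sum(col) / len(vs) for col in zip(*vs)]

def nearest_idx(centers: Sequence[Sequence[float]], p: Sequence[float]) -> int:
    return min(range(len(centers)), key=lambda j: dist2(centers[j], p))

def kmeans(points: Sequence[Vec], k: int, iters: int = 20) -> List[Vec]:
    """Lloyd's K-means with deterministic farthest-point initialization."""
    k = min(k, len(points))
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(c, p) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups: List[List[Vec]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest_idx(centers, p)].append(p)
        # keep the previous center if a cluster becomes empty
        centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers

def train_two_level(src: Sequence[Vec], tgt: Sequence[Vec], k1: int, k2: int):
    """Primary classes, then K-means subclasses, each with a local offset mapping."""
    primary = kmeans(src, k1)
    model: List[List[Tuple[Vec, Vec]]] = []
    for c in range(len(primary)):
        idx = [i for i, x in enumerate(src) if nearest_idx(primary, x) == c]
        if not idx:
            model.append([])
            continue
        subs = kmeans([src[i] for i in idx], k2)
        local = []
        for s in range(len(subs)):
            members = [i for i in idx if nearest_idx(subs, src[i]) == s]
            if members:
                mt, ms = mean([tgt[i] for i in members]), mean([src[i] for i in members])
                local.append((subs[s], [t - x for t, x in zip(mt, ms)]))
        model.append(local)
    return primary, model

def convert(primary, model, x: Sequence[float]) -> Vec:
    """Pick the primary class, then the closest subclass, and apply its offset."""
    local = model[nearest_idx(primary, x)]
    if not local:
        return list(x)  # no training data for this class: leave the frame unchanged
    _, offset = min(local, key=lambda co: dist2(co[0], x))
    return [v + o for v, o in zip(x, offset)]
```

Because every local mapping is computed in closed form from its subclass members, training reduces to clustering plus averaging, which reflects the abstract's point that no complex training algorithm is needed. A real system would likely use a richer local mapping than an offset.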

Crossref

INRIA a CCSD electronic archive server